Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name option #147

Open
wants to merge 11 commits into
base: master
Choose a base branch
from
Open

Name option #147

wants to merge 11 commits into from

Conversation

will-moore
Copy link
Member

@will-moore will-moore commented Jul 11, 2023

This PR adds an optional name_by argument with options of id (default behaviour) and name. It is needed for batch exporting many Images or Plates where we want the exported OME-Zarr image to have a useful name.

This PR has been used for all the omero-cli-zarr exports for ongoing IDR NGFF upgrade work:

When exporting from OMERO, we now adopt the naming convention of ID.ome.zarr or PlateName.ome.zarr instead of the previous ID.zarr.

If names contains square brackets [ ] then this can break writing to zarr (see errors below) so these are replaced by ( ).

To test:

$ omero zarr export Image:123 --name_by name
# will create image_name.ome.zarr

$ omero zarr export Plate:123 --name_by name
# will create plate_name.ome.zarr

default="id",
choices=["id", "name"],
help=(
"How to name the Image or Plate zarr. Default 'id' is [ID].zarr. "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That raises the question about the extension.
In both cases, the library is used to generate ome.zarr so in one case using .zarr and in the other case .ome.zarr is confusing

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I think we've stated elsewhere:

  • strongly SHOULD (bordering on MUST): .zarr
  • SHOULD: .ome.zarr

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joshmoore You're suggesting we update the default behaviour to be [ID].ome.zarr instead of [ID].zarr?
Always using .ome.zarr (for name.ome.zarr or ID.ome.zarr) is more consistent, so I'm happy to do that, but that is more of a breaking change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking in the sense that you think current processes won't detect that something has already been exported and will re-export?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, just if you've got a workflow like:

$ omero zarr export Image:1
$ mv 1.zarr renamed_image.zarr

that workflow will need to change because 1.zarr will now be created as 1.ome.zarr

But that's not a major concern, so happy to make that change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I'm personally a bit up in the air. A benefit of .zarr filesets are that other stuff can be stored in them which is why I don't think we should ever require .ome.zarr. On the flip side, what we do now will likely be the model that other submitters to BIA tend to follow, so it does make sense to do what we think is most strategic (and update any existing workflows as necessary). All that being said, 💯 for doing whatever consistently.

@will-moore
Copy link
Member Author

I have reverted the removal of .pattern from image names above.
Just looking at how to handle various image names... I previously thought that names containing whitespace were causing errors, but it seems this is not always the case, since this works OK...

omero zarr export Image:5025553 --name_by name
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Exporting to Tonsil 3.ome.zarr (0.4)
...

But exporting an Image with a more complex name, e.g. JL_120731_S6A [Well A-1; Field #1] https://idr.openmicroscopy.org/webclient/?show=image-1229801 fails, with an exception that can be reproduced as follows:

from zarr.storage import FSStore
from zarr.hierarchy import open_group
open_group(FSStore("JL_120731_S6A [Well A-1; Field #1]", mode="w"))

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 1465, in open_group
    return Group(store, read_only=read_only, cache_attrs=cache_attrs,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 164, in __init__
    meta_bytes = store[mkey]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1393, in __getitem__
    return self.map[key]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 143, in __getitem__
    result = self.fs.cat(k)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 826, in cat
    paths = self.expand_path(path, recursive=recursive)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1005, in expand_path
    out = self.expand_path([path], recursive, maxdepth)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1011, in expand_path
    bit = set(self.glob(p))
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/implementations/local.py", line 70, in glob
    return super().glob(path, **kwargs)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 591, in glob
    pattern = re.compile(pattern.replace("=PLACEHOLDER=", ".*"))
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/re.py", line 252, in compile
    return _compile(pattern, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/re.py", line 304, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_compile.py", line 788, in compile
    p = sre_parse.parse(p, flags)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 955, in parse
    p = _parse_sub(source, state, flags & SRE_FLAG_VERBOSE, 0)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 444, in _parse_sub
    itemsappend(_parse(source, state, verbose, nested + 1,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/sre_parse.py", line 599, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
re.error: bad character range A-1 at position 58

I don't know a good way to make all names "safe" from these types of errors.

Another example that fails with a different Error has Image name plate1_1_013 [Well 1, Field 1 (Spot 1)]

$ omero zarr export Image:179693 --name_by name
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Exporting to plate1_1_013 [Well 1, Field 1 (Spot 1)].ome.zarr (0.4)
Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 143, in __getitem__
    result = self.fs.cat(k)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 826, in cat
    paths = self.expand_path(path, recursive=recursive)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1005, in expand_path
    out = self.expand_path([path], recursive, maxdepth)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/spec.py", line 1031, in expand_path
    raise FileNotFoundError(path)
FileNotFoundError: ['/Users/wmoore/Desktop/ZARR/data/TEMO/plate1_1_013 [Well 1, Field 1 (Spot 1)].ome.zarr/.zgroup']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1393, in __getitem__
    return self.map[key]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/fsspec/mapping.py", line 147, in __getitem__
    raise KeyError(key)
KeyError: '.zgroup'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 164, in __init__
    meta_bytes = store[mkey]
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/storage.py", line 1395, in __getitem__
    raise KeyError(key) from e
KeyError: '.zgroup'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/main.py", line 125, in main
    rv = omero.cli.argv()
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1784, in argv
    cli.invoke(args[1:])
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1222, in invoke
    stop = self.onecmd(line, previous_args)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1299, in onecmd
    self.execute(line, previous_args)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/omero/cli.py", line 1381, in execute
    args.func(args)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/cli.py", line 125, in _wrapper
    return func(self, *args, **kwargs)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/cli.py", line 342, in export
    image_to_zarr(image, args)
  File "/Users/wmoore/Desktop/ZARR/omero-cli-zarr/src/omero_zarr/raw_pixels.py", line 56, in image_to_zarr
    root = open_group(store)
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 1465, in open_group
    return Group(store, read_only=read_only, cache_attrs=cache_attrs,
  File "/Users/wmoore/opt/anaconda3/envs/omeroweb2/lib/python3.9/site-packages/zarr/hierarchy.py", line 167, in __init__
    raise GroupNotFoundError(path)
zarr.errors.GroupNotFoundError: group not found at path ''

Same error with:

$ omero zarr export Image:3414011 --name_by name
Using session for [email protected]:4064. Idle timeout: 10 min. Current group: Public
Exporting to 10percent-Wt1-GFP-spheroid-MV.czi [0].ome.zarr (0.4)
...
zarr.errors.GroupNotFoundError: group not found at path ''

and

$ omero zarr export Image:9022301 --name_by name
Exporting to subpool-1_run-1_EXP-19-BQ3550 [Pos101].ome.zarr (0.4)
...
zarr.errors.GroupNotFoundError: group not found at path ''

But, removing the [ character from this name fixed the error, so it looks like it's being recognised as a regex, which is causing the errors above, particularly the re.error: bad character range A-1 at position 58?!

Other examples that work OK:

$ omero zarr export Image:1884807 --name_by name
Exporting to Centrin_PCNT_Cep215_20110506_Fri-1545_0_SIR_PRJ.dv.ome.zarr (0.4)
...
$ omero zarr export Image:4995043 --name_by name
Exporting to ExperimentB_No05_DMSO_11_10min__010.czi.ome.zarr (0.4)
...

@will-moore
Copy link
Member Author

So it looks like all names are OK except for those with [ and ] in them.
Not sure how to avoid those being recognised as broken regex without actually changing the name we want to write?

@will-moore
Copy link
Member Author

Replacing [] with () in names now.

@dominikl
Copy link
Member

👍 Looks good to me. I've used the build from this branch a few times already for the NGFF conversion/export work.

name = os.path.join(target_dir, "%s.zarr" % plate.id)
name_by = args.name_by

if name_by == "name":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two quick notes:

  • this code is duplicated and could be made a method for possible sub-classing
  • I can live with this split in naming schemes, but it could be a surprise

@will-moore
Copy link
Member Author

@joshmoore I reduced duplication by creating def get_zarr_name(obj, args)

I also noticed that we need the name for polygon/masks export, but supporting the --name_by name argument there could be quite a bit of work, so probably not worth it until we know it's needed. I fixed the .zarr -> .ome.zarr name at least and updated README

@will-moore will-moore mentioned this pull request Dec 13, 2023
@will-moore
Copy link
Member Author

Anything else needed here?

@joshmoore
Copy link
Member

Nothing outstanding from my side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants